The data comes from a recent government hackathon called “GovHack 2017”. The dataset can be downloaded from https://data.gov.au/dataset/govhackato by clicking on the link “GovHack 2017”. It contains three years’ worth of data, i.e. the years 2006, 2011 and 2015, aggregated at the postcode level. The data was compiled by the Australian Taxation Office (ATO) and the Australian Bureau of Statistics (ABS). For a good description of the data, see https://data.gov.au/dataset/govhackato/resource/1f187a2d-df9d-4e1e-871e-2bfcced4a5e4.
From this data, I wanted to get an idea of the current trends in taxation in Australia.
There are a few items that are referred to in this analysis:
- The working population is defined as the age groups between 15 and 64.
- The tax definitions and how tax is calculated can be inferred from the tax form “individual-tax-return-2015.pdf”, which is part of this pack.
The spreadsheet “atoabsgovhack2017.xlsx” was converted to a CSV file. From the CSV file the dataframe “ato_abs.data” was created. This whole analysis is based on that dataframe. Each data point corresponds to one postcode in Australia. Using the document “Postcode_Ranges.doc”, I was able to group each range of postcodes and aggregate the data to the relevant state. The column “State” in the dataframe records which state a particular postcode belongs to.
Columns from “Income.year” to “HELP.assessment.debt” contain data sourced from the Australian Taxation Office (ATO). The rest of the data is sourced from the Australian Bureau of Statistics (ABS).
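The postcode-to-state grouping can be sketched as follows. This is only an illustration in Python, not the code used for the analysis. The ranges are the commonly published main ranges and are an assumption here (they are not taken from “Postcode_Ranges.doc”, and PO box ranges are left out). The name `postcode_to_state` is hypothetical.

```python
# Assumed main postcode ranges per state/territory (inclusive bounds).
# NOT taken from "Postcode_Ranges.doc"; PO box ranges are omitted.
POSTCODE_RANGES = [
    (800, 899, "NT"),
    (2000, 2599, "NSW"),
    (2600, 2618, "ACT"),
    (2619, 2899, "NSW"),
    (2900, 2920, "ACT"),
    (2921, 2999, "NSW"),
    (3000, 3999, "VIC"),
    (4000, 4999, "QLD"),
    (5000, 5799, "SA"),
    (6000, 6797, "WA"),
    (7000, 7799, "TAS"),
]


def postcode_to_state(postcode):
    """Return the state for a postcode, or None if it is outside the assumed ranges."""
    code = int(postcode)  # accepts "0800" as well as 800
    for low, high, state in POSTCODE_RANGES:
        if low <= code <= high:
            return state
    return None


for example in ("2000", "2600", "0800"):
    print(example, postcode_to_state(example))
```

Postcodes that fall outside the listed ranges return `None`, so they can be inspected separately rather than silently assigned to a state.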
The above graphs show a roughly normal distribution for salaries at an individual level. Looking at the histograms, 2006 shows a normal distribution, while both 2011 and 2015 show a bit more of a positive skew and a bigger spread of salaries. The histogram for 2015 has shifted further to the right compared to 2006 and 2011. Individuals are getting paid more in 2015, but the distribution looks more spread out and shows a much longer tail. This is confirmed by the boxplot below:
The median salary per individual increased each year. In 2015 the highest median was in the ACT, followed by the Northern Territory (NT). This is rather odd because I would have expected it to be one of the most populous states: NSW, VIC or QLD. I’m not sure about the NT; however, it is known that the majority of people in the ACT work for the federal government. The ACT could therefore be seen as a place where individuals can earn more and thereby contribute more to tax revenue.
A more positively skewed distribution is shown for Net tax payable for each year. Interestingly, 2011 shows a majority of individuals paying lower tax than in the other two years. The histogram for 2015 shows a similar pattern to 2006, but the number of people paying more than $10000 in tax has increased, giving a longer tail. The box plot below confirms this.
The above plot shows the number of individuals who submitted an income tax return to the government. It shows an upward trend each year, led by New South Wales (NSW), Queensland (QLD) and Victoria (VIC). These three states are the most populous in Australia and are all found on the east coast. The number of individuals submitting a tax return has risen over the three income years. This gives a feel for the distribution of data in the dataset.
For the above bar plot, the data was averaged to the individual level. This made the data more readable, since the numbers weren’t too big.
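A minimal Python sketch of this kind of averaging is shown here. It assumes the average is the state total divided by the number of individuals in that state and year, which the text does not spell out. The rows are made-up numbers and the function name is hypothetical.

```python
from collections import defaultdict

# Made-up postcode-level rows: (State, Income.year, total net tax, number of individuals).
rows = [
    ("ACT", 2015, 3_000_000.0, 200),
    ("ACT", 2015, 1_000_000.0, 100),
    ("NSW", 2015, 2_400_000.0, 300),
]


def net_tax_per_individual(rows):
    """Sum net tax and individuals per (state, year), then divide the two totals."""
    totals = defaultdict(lambda: [0.0, 0])
    for state, year, net_tax, individuals in rows:
        totals[(state, year)][0] += net_tax
        totals[(state, year)][1] += individuals
    # Skip groups with no individuals to avoid dividing by zero.
    return {key: tax / n for key, (tax, n) in totals.items() if n > 0}


averages = net_tax_per_individual(rows)
for (state, year), value in sorted(averages.items()):
    print(state, year, round(value, 2))
```

Dividing the summed totals weights each postcode by its size. Averaging the per-postcode ratios instead would give small postcodes the same weight as large ones.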
Surprisingly, the Australian Capital Territory (ACT) and Western Australia (WA) have the honour of having, on average, the most Net tax paid per individual. Even more surprisingly, the Northern Territory (NT) is pretty much on par with NSW despite its lower population. Aside from WA, it’s interesting to note that in 2011 individuals paid less tax than in 2006. However, 2015 was higher than 2006 in terms of Net Tax payable per individual. Later on we’ll see the correlation between Net.tax and Salary/wages. This suggests that the ACT, WA and the NT are the most likely places for individuals to find higher salaries than in other states, and therefore to contribute more to taxation revenue.
I wanted to look at the age distributions across each state and Australia as a whole. This was to investigate the distribution of the working population and what impact it could have in the future on both income and the ability to collect tax from working people. It should be noted that the Australian Bureau of Statistics (ABS) considers the working population to be in the range of 15 to 64 years of age. Each age group spans 5 years. The data was split and aggregated up to the different states and then to the country. To make the data more readable, the y-axis for each bar plot is presented as a percentage of the total population. In the series of bar plots below, one for each state, the y-axis shows the population as a percentage of the state total.
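The percentage calculation behind the y-axis can be sketched in Python as follows. The age-group labels and counts are made up for illustration, and `age_group_shares` is a hypothetical helper rather than something from the original analysis.

```python
def age_group_shares(counts):
    """Convert population counts per age group into percentages of the total."""
    total = sum(counts.values())
    if total == 0:
        # No population recorded: report zero for every group.
        return {group: 0.0 for group in counts}
    return {group: 100.0 * n / total for group, n in counts.items()}


# Made-up counts for one state in one year.
state_counts = {"15-19": 400, "20-24": 600, "60-64": 1000}
shares = age_group_shares(state_counts)

for group, share in shares.items():
    print(f"{group}: {share:.1f}%")
```

Working in percentages lets states with very different population sizes be compared on the same scale.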
The first observation was the increase in the population of people in the age groups from 60 to 74. From 2006 onwards there’s been a decrease in the younger age groups. So NSW is experiencing an ageing population and, interestingly, the bar plot looks bi-modal.
Like NSW, there’s an increase in the population of people in the age groups from 60 to 74. However, unlike NSW, there’s been an increase in the population in the age groups from 25 to 34 from 2006 onwards. Still, this isn’t enough to offset the growth of the older age groups.
QLD seems to show similar properties to NSW in terms of growth in the population of people aged between 60 and 74. There’s no real growth in the younger age groups.
The ACT has seen growth in the 30 to 34 year old age group. Again, it’s showing growth in the older age groups, i.e. people over 60 years of age.
Again, South Australia shows an ageing population. In other words, there is growth in the population of people in the age groups from 60 to 74. This is compounded by a decline in the younger population.
Western Australia is showing an increase in the age groups from 25 to 34. However, this looks to be offset by increased growth in the age groups from 60 to 74. Interestingly, the 25 to 34 year old groups show a bigger upward trend than in the previous states.
There’s been an increase in the 25 to 34 year old age groups. The increase is similar to that of WA. Again this is offset by an increase in the age groups from 60 to 74.
Compared to all the other states, Tasmania has shown a remarkable increase in the 60 to 74 year old age groups as a percentage of the state’s total population.
I aggregated the bar plots to the state level. This was done to see how the states differed from each other. New dataframes were created to separate the data for each state, grouped by year of income. Interestingly enough, the data for each state showed a bi-modal distribution.
According to the Australian Bureau of Statistics (ABS), the working population is defined as 15 to 64 years of age (http://www.abs.gov.au/AUSSTATS/abs@.nsf/Previousproducts/3101.0Feature%20Article1Jun%202016). With the exception of Tasmania and the ACT, there has been growth, if only slight, in the 25 to 34 age group. What does this have to do with taxation? Well, if the Government wants to prevent a downturn in its taxation revenue, then it has to seriously start exploring widening the range of the working population, i.e. look at possibly increasing the retirement age from 65 to 75. Alternatively, it could look at incentivising younger families to have more children, or allowing more young families to migrate to Australia, in order to offset the ageing population. The percentage of the total population in the 55 to 79 age range is growing in each state, notably in Tasmania and the ACT. So the question for both the ACT and Tasmania is: what can they do to attract a younger working population?
Generally, there has been an increase in the ageing population across Australia. This is backed up by the bar plot for the entire country. It raises some interesting questions, such as whether Australia will need to raise the retirement age from 65.
The correlation between the various age groups and Net tax, as well as Salary, is quite high. With such high correlations for every age group, no real outcome or analysis could come out of it. So there was no point in trying to analyse Net Tax and Salary against each of the age groups.
Were there any meaningful relationships that could be further explored with other taxation-related parameters? From the above correlation matrix, it would be worthwhile to explore Net rental income against both tax and the total deductions. However, before that it would be good to confirm the obvious relationship between Net tax and Salary. Is there anything interesting?
At first glance, there’s definitely a strong relationship between Salary and Net Tax. It’s quite clear that the more you earn, the more tax you pay. This could be an informative plot; however, let’s get down to the individual level.
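For reference, the correlations discussed here are presumably Pearson correlation coefficients. A minimal Python sketch of that calculation follows. It is an illustration only, with made-up numbers, and `pearson` is a hypothetical helper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up postcode totals: people in one age group vs total net tax.
age_group_counts = [120, 340, 560, 900]
net_tax_totals = [1.1e6, 3.0e6, 5.2e6, 8.4e6]
print(round(pearson(age_group_counts, net_tax_totals), 3))
```

A coefficient close to 1 or -1 for every pair of columns leaves little to distinguish one age group from another, which matches the observation above.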
I decided to look at the relationship more closely by dividing the postcode-level totals by the number of individuals in each postcode. This gave me the relationship of Net Tax payable vs Salary at an individual level. Instead of specifying a linear model for the geom_smooth function, I let the function choose its own method of best fit. What’s interesting to note is that up to a $90000 salary, the Net Tax payable grows exponentially. However, beyond a $90000 salary, the Net tax payable seems to level off at around $40000, no matter how high one’s salary is. This could mean that people earning more than $90000 are aggressively increasing their tax deductions because they can afford to. What’s also interesting to note is that the Net tax payable is lower for 2011 and 2015 than for 2006. There’s not much of a difference in the data between 2011 and 2015; however, salaries seem to have increased between those two years.
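To get a rough feel for a trend like this without a plotting library, the per-individual points can be averaged within salary bands. The Python sketch below uses simple binned means, which is a much cruder technique than the smoother chosen by geom_smooth and is not what the analysis used. The data points and the name `binned_means` are made up for illustration.

```python
from collections import defaultdict


def binned_means(points, bin_width):
    """Average y within fixed-width x bins; returns (bin centre, mean y) pairs."""
    bins = defaultdict(list)
    for x, y in points:
        bins[int(x // bin_width)].append(y)
    return [
        (index * bin_width + bin_width / 2, sum(ys) / len(ys))
        for index, ys in sorted(bins.items())
    ]


# Made-up (salary per individual, net tax per individual) pairs, one per postcode.
points = [(30_000, 4_000), (35_000, 6_000), (60_000, 14_000), (95_000, 30_000)]
trend = binned_means(points, 25_000)

for centre, mean_tax in trend:
    print(centre, mean_tax)
```

If the mean net tax stops rising from one salary band to the next, that is the levelling-off pattern described above.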
In the plot “Net Tax payable vs Total Deductions”, it appears that the Net tax payable starts to taper off for individuals claiming over $7500 in tax deductions. This is interesting because it suggests that no matter how much one tries to claim in tax deductions, it may not really result in big tax savings.
The above plot shows a negative correlation. To create a cleaner plot, I used Net tax as the predictor. It shows that having a loss in rental income can contribute towards a lower Net tax, or vice versa. However, individuals paying between $20000 and $25000 in tax show the maximum claimed rental loss. After that point, as the individual pays more tax, rental income increases. Showing a loss in rental income in order to offset it against tax is commonly known in Australia as negative gearing. So the highest-taxed individuals are the ones most likely to have positive rental income, otherwise known as “positive” gearing. Also observed in the above plot is that Net rental incomes rose in 2015. This could be attributed to two factors: lower interest rates on loans and higher rents in 2015 compared to 2011.
I picked the three most popular items that ordinary taxpayers would look at in order to reduce their tax, and compared them with the Total deductions column in the dataset. Looking at the first of the three plots, there was no correlation between rental income and the total deductions. I was initially surprised at this, but on investigating further I found that the Tax Office doesn’t include rental income loss when calculating the Total deductions, which could explain the very low correlation. However, both “Gifts or Donations” and “Total Work related expenses” are indeed part of the calculation of the “Total Deductions” column in the dataset. There was a higher correlation between Gifts/Donations and Total Deductions than between “Total work related expenses” and “Total Deductions”. Looking closer at those two plots, it looks like Total Tax deductions have come down over the years, whilst there have been slight increases in both giving and claiming work related expenses. Could it be a case of the Australian Tax Office being a bit more stringent? This is something that can be investigated further.
Below are three plots that I felt were important:
I actually wanted to include the bar plots for every state, but that would have been too much. So instead, I put up the bar plot for all of Australia, which best summarises the finding. Compared to 2006, the share of the population in the age groups between 60 and 74 has risen over the years. This growth has been more rapid than in any other age group. So what does this underlying trend show? Basically, an ageing population means more people may retire, and therefore revenue from taxation could drop as more people leave the workforce. The government may try to offset this by raising the retirement age, raising taxes over a period of time, or increasing the consumption tax, otherwise known as the Goods and Services Tax (GST).
In my humble opinion, this second plot is interesting in that it shows two emerging patterns. Leading up to approx $75000, the net tax payable increases logarithmically as salary increases. After that point, it starts to level off at around $40000 as salary increases. So does that mean that as one earns more than $75000, and that figure increases annually, one ends up paying tax at a lesser rate than those earning less? This could mean that the rich are less burdened by taxation than those earning less than $75000. Those earning more also benefit from being able to afford tax minimisation strategies by hiring experts.
This third plot shows rental income vs Net Tax at an individual level. It’s noteworthy in that it shows a negative relationship: as Net Tax payable increases, the net rental income after expenses drops. However, once an individual has a Net tax above $30000, the rental income has a tendency to increase. Showing a loss in rental income from an investment property portfolio is a strategy high income earners use to try to minimise their tax payments.
This strategy, as mentioned earlier, is known as negative gearing. It’s used by individuals on higher salaries. It’s also interesting to note that rental income has increased from 2011 to 2015. This increase is a result of interest rates on loans dropping, coupled with higher rents being charged by investors.
Looking at the data, I’ve been able to glean the following insights:
- Australia is starting to have an ageing population. This could prove to be an issue for the Federal government in the next few years, as tough decisions will need to be made, and it is unclear whether the population will accept them.
- Net Tax and Salary are shown to have a very high correlation when aggregated at the postcode level. However, when taken down to the individual level, some interesting patterns start to come up. Up to approx $75000 in Salary, it looks to be a linear relationship; beyond that, it starts to level off. This could mean one of two things: the tax system could be beneficial to high earning individuals in allowing them to take home more pay, or high tax individuals are able to afford taxation advice that results in more tax offsets.
- A very popular way of reducing tax as individuals earn higher salaries is to take advantage of rental income loss in order to lower their payable tax. This is known as negative gearing. It brings about a strong culture of buying investment property among higher income individuals.
- Surprisingly, individuals earning $90000 and more seem to claim less than $5000 in tax deductions. I would have thought it would be more. Maybe they look to claim other tax offsets that fall outside the realm of direct tax deductions. See the pdf <> for details on how tax deductions are calculated.
- Rental incomes from investment property increased from 2006 to 2015. Another interesting observation was that Salaries also increased. These increases seem small and could simply reflect keeping up with inflation. Additionally, tax deductions have decreased slightly over the years.
- 2011 is a very interesting year. At certain data points it bucks the trend, e.g. in the plot “Net Tax per Individual for each Year - Boxplot”, where the Median Net Tax payable is lower than in 2006 and 2015. I would use this as an example of why further investigation is required.
The main issues I found with this dataset were:
- I felt the data was aggregated too much. For example, just because a particular postcode has more of a North Western European ethnic group, it doesn’t necessarily mean that this ethnic group contributed to higher salaries and thereby to more tax. I felt that there were quite a few of those situations throughout the dataset.
- It would have been better to include job categories and combine them with the ethnic groups. Additionally, with job categories, it would have been nice to
- Only 3 years of data were provided: 2006, 2011 and 2015. It would have been better to have data for every year from 2006 to 2017. In my opinion, this would have given a closer look at the trends by covering the in-between years as well as more up to date data, i.e. 2017.
- I felt that I couldn’t do much analysis with the data on ethnicity, marriage status and fortnightly income. I couldn’t see any meaningful correlations or relationships in trying to
So what can be done further with this dataset? I’m not sure that this dataset could be used to gain further insights, for the reasons mentioned above. However, while researching on the internet for other datasets in this area that could be used for further insights, I came across eechidna (https://cran.r-project.org/web/packages/eechidna/index.html). Eechidna would be a good dataset to use to drill into the areas identified in this analysis, in conjunction with further data from the Australian Taxation Office (ATO). To conclude